AUXILIARY INFORMATION GENERATION METHOD,
AUXILIARY INFORMATION GENERATION APPARATUS,
VIDEO DATA GENERATION METHOD, VIDEO DATA PLAYBACK METHOD,
VIDEO DATA PLAYBACK APPARATUS, AND DATA STORAGE MEDIUM

FIELD OF THE INVENTION

The present invention relates to an auxiliary information 
generation method, an auxiliary information generation apparatus, 
an image data generation method, and a data storage medium. More 
particularly, the invention relates to a method and an apparatus
for generating auxiliary information which is used as index
information when extracting or selecting a part of digital data
such as digital video and audio data, and a method for generating
partial video data by extracting a desired portion of image data
by utilizing the auxiliary information, as well as a method and
an apparatus for playing the partial video. Further, the
invention relates to a data storage medium which stores a program
for making a computer execute the auxiliary information 
generation method and the video data generation method, and data 
generated as the result of executing these methods. 

BACKGROUND OF THE INVENTION

In recent years, with the progress in digitization of video
and audio, standardization of video and audio data compression
methods such as MPEG-2 and MPEG-4 has been achieved for the
purpose of improving efficiency in recording or transmission and,
furthermore, standardization relating to description of auxiliary
information, which is used when selecting desired data from a
database holding these video and audio data or extracting a
portion of the video data, has been promoted as MPEG-7.

Hereinafter, an example of description of auxiliary
information relating to digital data based on MPEG-7 will be
described with reference to drawings (ISO/IEC JTC 1/SC 29/WG
11/N3411, "MPEG-7 Multimedia Description Schemes WD (Version
3.0)", 2000.5).

As for viewing of video and audio data, it has been common
practice that contents of video and audio data produced by a
producer are broadcast and viewed by many people. However, as
mobile devices such as personal computers and handy phones have
become widespread, these mobile devices have permitted the users
to interactively operate video and audio data through the
Internet or the like, in addition to viewing these data passively.
Accordingly, it is expected that service patterns, which enable
the users to retrieve only a desired portion of contents from
databases scattered on the Internet without viewing all of the
contents, or enable the providers to select contents according to
preferences of the users and distribute the contents to the users,
will become widespread in the future. MPEG-7 is an international
standard, which is now under standardization, for describing
temporal information, contents, preferences of users, and the
like with respect to multimedia data such as video and audio.

Figure 50 shows an example of description based on MPEG-7
(excerpt from ISO/IEC JTC 1/SC 29/WG 11/N3410, "MPEG-7 Multimedia
Description Schemes XM (Version 3.0)", 2000-5). In this example,
a portion of video data is described by a description of
VideoSegment, and the VideoSegment corresponds to one scene.
Auxiliary information is described by using MediaTimePoint
indicating the start time of this segment, MediaDuration
indicating the duration of this segment, SegmentDecomposition
indicating the presence or absence of a gap between segments, and
the like. As shown in figure 51, this auxiliary information 3003
is added to a header 3002 or the like of video data 3000, whereby
the user can easily search for or extract desired data such as
video data. In figure 51, 3001 denotes a data body corresponding
to the video data 3000 excluding the header 3002.
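
As an illustration only, the following Python sketch builds a
description of the kind outlined above. The element names
VideoSegment, MediaTimePoint, MediaDuration, and
SegmentDecomposition come from the text; the nesting, the
attribute name `gap`, and the time and duration strings are
assumptions made for this example and are not taken from the
MPEG-7 working draft.

```python
import xml.etree.ElementTree as ET


def describe_segment(start, duration, gap=False):
    """Return a VideoSegment element for one scene (hypothetical layout)."""
    seg = ET.Element("VideoSegment")
    # Start time and duration of the segment, stored as plain text.
    ET.SubElement(seg, "MediaTimePoint").text = start
    ET.SubElement(seg, "MediaDuration").text = duration
    # 'gap' is an assumed attribute name signalling a gap between segments.
    ET.SubElement(seg, "SegmentDecomposition", gap=str(gap).lower())
    return seg


# The time strings below are placeholders, not values from the standard.
segment = describe_segment("T00:00:00", "PT1M30S")
xml_text = ET.tostring(segment, encoding="unicode")
```

A description built in this way could then be placed in the
header 3002 or stored separately from the data body 3001.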

In MPEG-7, however, only the description itself of the 
auxiliary information relating to contents information is 
standardized, and a method for generating the auxiliary 
information is not defined. Further, there is no definition
about what kind of information is to be provided using MPEG-7. 

Meanwhile, with respect to mobile devices such as handy
phones which have rapidly become widespread or progressed in
functions, it is expected that those provided with shooting
functions for still or moving pictures will become available
inexpensively in the future. In this case, a shot (moving)
picture can be transmitted to a destination through a mobile
communication network. On the other hand, the telephone charge
on such video communication is not necessarily a fixed charge
such as a month-by-month basis. In the case of mobile terminals,
a pay-per-view basis according to the amount of transmitted or
received data is usually employed. Therefore, if the whole of a
shot (moving) picture is transmitted as it is, the communication
cost will become high.

SUMMARY OF THE INVENTION

The present invention is made to solve the above-described
problems and has for its object to provide a method and an
apparatus for generating auxiliary information relating to
digital data, a method for generating video data, which can
reduce the cost of transmitting a picture from a mobile terminal,
a method and an apparatus for playing a part of video data having
a high degree of importance, and a data storage medium which
holds a program for executing these methods as well as data
obtained as the result of executing these methods.

Other objects and advantages of the invention will become 
apparent from the detailed description that follows. The
detailed description and specific embodiments described are 
provided only for illustration since various additions and 
modifications within the scope of the invention will be apparent 
to those of skill in the art from the detailed description. 

According to a first aspect of the present invention, there
is provided an auxiliary information generation method
comprising: generating auxiliary information relating to digital
data when the digital data is generated; and recording or
transmitting the generated auxiliary information together with
the digital data. Therefore, auxiliary information can be easily
generated at a desired generation timing which is not defined in
the standard.

According to a second aspect of the present invention, in the 
auxiliary information generation method of the first aspect, a 
candidate of auxiliary information to be generated is selected; 
and auxiliary information corresponding to the selected candidate 
is generated. Therefore, auxiliary information can be easily 
generated at a desired generation timing which is not defined in 
the standard.

According to a third aspect of the present invention, in the 
auxiliary information generation method of the first aspect, 
generation of the auxiliary information is carried out in 
synchronization with the start or completion of one of inputting, 
recording, and transmission of the digital data. Therefore, 
auxiliary information can be easily generated at a desired 
generation timing which is not defined in the standard. 

According to a fourth aspect of the present invention, in the 
auxiliary information generation method of the first aspect, 
generation of the auxiliary information is carried out with, as a
trigger, user operation performed on a data generation apparatus
which generates the digital data. Therefore, auxiliary
information can be generated at a desired generation timing which
is not defined in the standard.
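
The generation timings of the third and fourth aspects can be
pictured with a minimal Python sketch. It assumes a hypothetical
recorder that reports events by name; the class
`AuxiliaryInfoRecorder`, the event names, and the `importance`
field are illustrative and do not appear in the specification.

```python
class AuxiliaryInfoRecorder:
    """Collects auxiliary information at events of a hypothetical recorder."""

    def __init__(self):
        self.entries = []

    def on_event(self, event, time_sec, **info):
        # Event names such as "record_start" are assumptions for this sketch.
        self.entries.append({"event": event, "time": time_sec, **info})


recorder = AuxiliaryInfoRecorder()
# Third aspect: generation synchronized with the start of recording.
recorder.on_event("record_start", 0.0)
# Fourth aspect: generation triggered by a user operation.
recorder.on_event("user_button", 12.5, importance=0.9)
# Third aspect: generation synchronized with the completion of recording.
recorder.on_event("record_stop", 30.0)

segment_duration = recorder.entries[-1]["time"] - recorder.entries[0]["time"]
```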

According to a fifth aspect of the present invention, in the
auxiliary information generation method of the first aspect, the
digital data is video and audio data; and the auxiliary
information includes any of temporal information, contents, and
degree of importance of the video and audio data. Therefore, any
of temporal information, title, and degree of importance, which
are useful as indexes, can be used as auxiliary information which
is information for searching video and audio data.

According to a sixth aspect of the present invention, there 
is provided an auxiliary information generation apparatus for
generating auxiliary information relating to digital data when 
the digital data is generated, and recording or transmitting the 
digital data and the auxiliary information. Therefore, auxiliary 
information can be easily generated at a desired generation 
timing which is not defined in the standard. 

According to a seventh aspect of the present invention, the 
auxiliary information generation apparatus of the sixth aspect 
comprises a CPU which is included in a data generation apparatus 
for generating the digital data. Therefore, auxiliary 
information can be easily generated at the end of the data 
generation apparatus which generates digital data. 

According to an eighth aspect of the present invention, in
the auxiliary information generation apparatus of the seventh
aspect, the data generation apparatus includes a display
means for displaying the digital data, and the CPU includes: a
menu display means for displaying, on the display means, a menu
of auxiliary information which is to be selected by the user of
the data generation apparatus; a model selection means for
selecting a model of auxiliary information according to options
in the menu selected by the user; and a model rewriting means for
rewriting parameter sections in the selected model according to
an instruction from the user. Therefore, the user can generate
auxiliary information according to a menu display, resulting in a
user-friendly auxiliary information generation apparatus.
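
The flow of the eighth aspect (select a model from a menu option,
then rewrite its parameter sections) can be sketched in Python as
follows. This is an assumption-laden illustration: the menu
options, the model texts, and the helper names `select_model` and
`rewrite_model` are hypothetical, and the models are not actual
MPEG-7 descriptions.

```python
from string import Template

# Hypothetical models of auxiliary information, keyed by menu option.
# Parameter sections are marked with $name placeholders.
MODELS = {
    "importance": Template(
        "<PointOfView viewpoint='$viewpoint'><Value>$value</Value></PointOfView>"
    ),
    "title": Template("<Title>$title</Title>"),
}


def select_model(option):
    """Model selection: return the model for the option chosen in the menu."""
    return MODELS[option]


def rewrite_model(model, **params):
    """Model rewriting: fill the parameter sections with the user's input."""
    return model.substitute(**params)


description = rewrite_model(
    select_model("importance"), viewpoint="athletic meeting", value="0.8"
)
```

Using `substitute` rather than `safe_substitute` makes a missing
parameter raise an error, so an incompletely rewritten model is
never recorded.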

According to a ninth aspect of the present invention, there 
is provided a video data generation method comprising: reducing 
the length of digital data including video on the basis of
auxiliary information relating to the digital data, thereby 
generating reduced digital data; and recording or transmitting 
the reduced digital data. Therefore, reduced digital data can be 
generated considering not only the auxiliary information but also 
the time required for transmission or the capacity required for 
recording, and important video data can be generated using the 
generated auxiliary information, whereby the communication cost 
and the capacity required for recording are reduced. 

According to a tenth aspect of the present invention, in the 
video data generation method of the ninth aspect, the reduced 
digital data is generated by preferentially extracting digital 
data having a high degree of importance, on the basis of the 
auxiliary information. Therefore, it is possible for a producer
of data to generate reduced digital data comprising only
important segments for the producer.

According to an eleventh aspect of the present invention, in 
the video data generation method of the tenth aspect, generation 
of the reduced digital data is carried out on the basis of the 
time required for transmission of the digital data or the storage 
capacity required for recording of the digital data, in addition 
to the auxiliary information; and the generated reduced digital
data is transmitted. Therefore, only a portion of original data
having a high degree of importance can be transmitted to a
destination.

According to a twelfth aspect of the present invention, in
the video data generation method of the tenth aspect, the
generated reduced digital data is reproduced at the side where
the digital data is generated. Therefore, only a portion of
original data having a high degree of importance can be
reproduced at the end where the digital data is generated.
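
The reduction described in the ninth to eleventh aspects can be
sketched as a greedy selection under a size budget. The sketch
below is one possible reading, not the method of the embodiments:
the `Segment` fields, the byte budget `max_bytes`, and the greedy
strategy are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float       # seconds from the beginning of the recording
    duration: float    # seconds
    importance: float  # degree of importance from the auxiliary information
    size: int          # bytes needed to record or transmit the segment


def reduce_by_importance(segments, max_bytes):
    """Keep the most important segments that fit in max_bytes, in time order."""
    chosen, used = [], 0
    # Visit segments from the highest to the lowest degree of importance.
    for seg in sorted(segments, key=lambda s: s.importance, reverse=True):
        if used + seg.size <= max_bytes:
            chosen.append(seg)
            used += seg.size
    # Restore chronological order so the reduced data plays naturally.
    return sorted(chosen, key=lambda s: s.start)


segments = [
    Segment(0, 60, 0.2, 600),
    Segment(60, 30, 0.9, 300),
    Segment(90, 45, 0.7, 450),
    Segment(135, 20, 0.4, 200),
]
reduced = reduce_by_importance(segments, max_bytes=800)
```

With these sample values the two most important segments
(starting at 60 s and 90 s) are kept and the rest are dropped,
because adding any further segment would exceed the budget.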

According to a thirteenth aspect of the present invention, 
there is provided a video data generation method comprising: 
reducing the length of digital data including video on the basis 
of auxiliary information relating to the digital data, and 
information relating to transmission, thereby generating reduced 
digital data; and recording or transmitting the reduced digital 
data. Therefore, reduced digital data can be generated 
considering not only the auxiliary information but also the
information relating to transmission, and important video data 
can be generated using the generated auxiliary information, 
whereby the communication cost and the capacity required for 
recording are reduced. 

According to a fourteenth aspect of the present invention, in 
the video data generation method of the thirteenth aspect, the 
information relating to transmission is information about the 
name of a destination. Therefore, reduced digital data can be
generated considering not only the auxiliary information but also 
the information relating to the name of the destination. 

According to a fifteenth aspect of the present invention, in 
the video data generation method of the thirteenth aspect, the 
information relating to transmission is information about the 
contents to be transmitted. Therefore, reduced digital data can 
be generated considering not only the auxiliary information but 
also the information relating to the contents to be transmitted. 

According to a sixteenth aspect of the present invention, in 
the video data generation method of the thirteenth aspect, the 
information relating to transmission is information about the 
capability of a terminal at a destination. Therefore, reduced 
digital data can be generated considering not only the auxiliary 
information but also the information about the capability of the 
terminal at the destination. 

According to a seventeenth aspect of the present invention, 
there is provided a video data generation method comprising:
selecting at least one piece of digital data from plural pieces
of digital data including video, on the basis of auxiliary 
information relating to the digital data and information relating 
to transmission; and recording or transmitting the selected 
digital data. Therefore, at least one piece of digital data can 
be selected from plural pieces of digital data according to the 
auxiliary information and the information relating to 
transmission to a destination, and the selected data can be 
transmitted to a destination. Thus, important video data is 
generated using the generated auxiliary information, whereby the 
communication cost and the capacity required for recording can be 
reduced.

According to an eighteenth aspect of the present invention,
in the video data generation method of the seventeenth aspect, 
the information relating to transmission is information about the 
name of a destination. Therefore, at least one piece of digital 
data can be selected from plural pieces of digital data according 
to the auxiliary information and the information about the name 
of the destination, and the selected data can be transmitted to 
the destination. 

According to a nineteenth aspect of the present invention, in 
the video data generation method of the seventeenth aspect, the 
information relating to transmission is information about the 
contents to be transmitted. Therefore, at least one piece of
digital data can be selected from plural pieces of digital data
according to the auxiliary information and the information about 
the contents to be transmitted, and the selected data can be 
transmitted to the destination. 

According to a twentieth aspect of the present invention, in 
the video data generation method of the seventeenth aspect, the 
information relating to transmission is information about the 
capability of a terminal at a destination. Therefore, at least 
one piece of digital data can be selected from plural pieces of 
digital data according to the auxiliary information and the 
information about the capability of the terminal at the 
destination, and the selected data can be transmitted to the 
destination. 
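
As a rough illustration of the seventeenth and twentieth aspects,
the following Python sketch selects one piece of digital data
according to its degree of importance and the capability of the
destination terminal. The dictionary keys (`width`, `codec`,
`max_width`, `codecs`) and the sample values are hypothetical;
the specification does not define how capability is expressed.

```python
def select_for_destination(pieces, terminal):
    """Pick the most important piece the destination terminal can play."""
    playable = [
        p for p in pieces
        if p["width"] <= terminal["max_width"] and p["codec"] in terminal["codecs"]
    ]
    if not playable:
        return None  # nothing suitable for this destination
    return max(playable, key=lambda p: p["importance"])


pieces = [
    {"name": "full", "width": 720, "codec": "MPEG-2", "importance": 0.9},
    {"name": "digest", "width": 176, "codec": "MPEG-4", "importance": 0.8},
    {"name": "thumbnail", "width": 176, "codec": "MPEG-4", "importance": 0.3},
]
# Assumed capability of a handy phone at the destination.
phone = {"max_width": 176, "codecs": {"MPEG-4"}}
selected = select_for_destination(pieces, phone)
```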

According to a twenty-first aspect of the present invention, 
there is provided a video data playback method comprising: 
reducing the length of digital data including video, on the basis 
of auxiliary information relating to the digital data, thereby 
generating reduced digital data; and displaying the reduced 
digital data. Therefore, only a portion of the original data 
having a high degree of importance can be played at the end where 
the digital data is generated, and the time required for the 
playback can be reduced. 

According to a twenty-second aspect of the present invention, 
there is provided a video data playback apparatus for reducing 
the length of digital data including video, on the basis of
auxiliary information relating to the digital data, thereby
generating reduced digital data; and displaying the reduced
digital data. Therefore, only a portion of the original data
having a high degree of importance can be played at the end where
the digital data is generated, and the time required for the
playback can be reduced.

According to a twenty-third aspect of the present invention, 
there is provided a data storage medium which stores a data 
processing program for making a computer execute the auxiliary 
information generation method according to the first aspect. By 
using this recording medium, generation of auxiliary information
can be carried out with a computer. 

According to a twenty-fourth aspect of the present invention, 
there is provided a data storage medium which stores a data 
processing program for making a computer execute the video data 
generation method according to the thirteenth aspect. By using 
this recording medium, generation of reduced video data smaller 
than the original data can be carried out with a computer.

According to a twenty-fifth aspect of the present invention, 
there is provided a data storage medium which stores a data 
processing program for making a computer execute the video data 
generation method according to the seventeenth aspect. By using 
this recording medium, a process of generating video data by 
selecting at least one piece of data from the original data, can 
be carried out with a computer. 

According to a twenty-sixth aspect of the present invention,
there is provided a data storage medium which stores the
auxiliary information generated by the auxiliary information
generation method according to the first aspect. Therefore,
generation of auxiliary information can be carried out with a
computer and, furthermore, the auxiliary information can be
stored separately from the original data.

According to a twenty-seventh aspect of the present invention, 
there is provided a data storage medium which stores the video 
data generated by the video data generation method according to 
the thirteenth aspect. Therefore, selection of important video 
data from the original video data can be carried out with a 
computer, and the selected video data can be stored separately 
from the original video data. 

According to a twenty-eighth aspect of the present invention, 
there is provided a data storage medium which stores the video 
data generated by the video data generation method according to 
the seventeenth aspect. Therefore, selection of important video 
data from the original video data can be carried out with a 
computer, and the selected video data can be stored separately 
from the original video data. 

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a block diagram illustrating a combined camera 
and digital VTR having an auxiliary information generator 
according to a first embodiment of the present invention. 



Figure 2 is a flowchart for explaining the operation of a 
CPU 11 when generating auxiliary information, according to the 
first embodiment. 

Figure 3 is a diagram for explaining an example of auxiliary
information which is generated by the auxiliary information 
generator according to the first embodiment. 

Figure 4 is a schematic diagram illustrating a combined 
camera and digital VTR as an example of an image 
recording/playback apparatus. 

Figure 5 is a block diagram illustrating an auxiliary 
information generator implemented by the CPU 11. 

Figure 6 is a diagram illustrating a question displayed 
according to the menu method. 

Figure 7 is a diagram illustrating a displayed question and 
a way for answering the question, according to the menu method. 

Figure 8 is a diagram illustrating another way for answering
the question according to the menu method. 

Figure 9 is a diagram illustrating an information switch 
provided on the upper surface of the body of the combined camera 
and digital VTR. 

Figure 10 is a diagram illustrating a pressure sensor 
provided on the upper surface of the body of the combined camera 
and digital VTR. 

Figure 11 is a diagram illustrating a sweat sensor provided
on the upper surface of the body of the combined camera and
digital VTR.

Figure 12 is a block diagram illustrating an example of a
menu input means.

Figure 13 is a diagram illustrating another example of 
answer input according to the menu method. 

Figure 14 is a block diagram illustrating another example of 
a menu input means.

Figure 15 is a diagram illustrating another example of
answer input according to the menu method. 

Figure 16 is a block diagram illustrating another example of 
a menu input means. 

Figure 17 is a diagram illustrating another example of 
answer input according to the menu method.

Figure 18 is a block diagram illustrating a CPU having a 
button pattern moving means. 

Figure 19 is a block diagram illustrating another example of
a menu input means.

Figure 20 is a diagram illustrating another example of 
answer input according to the menu method. 

Figure 21 is a diagram illustrating an example of a menu screen
for selecting a rule for inputting auxiliary data.

Figure 22 is a diagram illustrating an example of a menu
screen for selecting a target of shooting.

Figure 23 is a diagram illustrating an example of a menu
screen for selecting PointOfView and its degree of importance.

Figure 24 is a block diagram illustrating another example of
a menu input means.

Figure 25 is a diagram illustrating an example of a screen
showing PointOfView and its degree of importance which are
selected from the menu.

Figure 26 is a diagram illustrating another example of a screen
showing PointOfView and its degree of importance which are
selected from the menu.

Figure 27 is a diagram illustrating another example of a screen
showing PointOfView and its degree of importance which are
selected from the menu.

Figure 28 is a diagram illustrating another example of a screen
showing PointOfView and its degree of importance which are
selected from the menu.

Figure 29 is a block diagram illustrating the internal
structure of a CPU which enables the screen display shown in
figure 25.

Figure 30 is a block diagram illustrating the internal 
structure of a CPU which enables the screen display shown in 
figure 26(a).

Figure 31 is a block diagram illustrating the internal
structure of a CPU which enables the screen display shown in
figure 26(b).

Figure 32 is a block diagram illustrating the internal
structure of a CPU which enables the screen display shown in
figure 27.

Figure 33 is a block diagram illustrating the internal 
structure of a CPU which enables the screen display shown in 
figure 28. 

Figure 34 is a block diagram illustrating the internal 
structure of a CPU which controls the combined camera and digital 
VTR. 

Figure 35 is a diagram illustrating a handy phone including 
an auxiliary information generator according to a second 
embodiment of the present invention. 

Figure 36 is a block diagram illustrating the handy phone
including the auxiliary information generator according to the 
second embodiment. 

Figure 37 is a block diagram illustrating a mail formation 
function and a (moving) picture attachment function, of the handy 
phone according to the second embodiment. 

Figure 38 is a diagram illustrating the state where a 
question is displayed on a liquid crystal display of the handy 
phone according to the second embodiment. 

Figure 39 is a diagram illustrating the state where a 
question and answer buttons are displayed on the liquid crystal 
display of the handy phone according to the second embodiment. 

Figure 40 is a diagram illustrating the state where the user
puts a finger on the liquid crystal display of the handy phone
according to the second embodiment.

Figure 41 is a diagram illustrating the state where the user
applies a pen onto the liquid crystal display of the handy phone
according to the second embodiment.

Figure 42 is a diagram illustrating the state where the user
puts a finger on an information switch which is provided on the
handy phone according to the second embodiment.

Figure 43 is a diagram illustrating the state where the user
puts a finger on a pressure sensor which is provided on the handy
phone according to the second embodiment.

Figure 44 is a diagram illustrating the state where the user
puts a finger on a sweat sensor which is provided on the handy
phone according to the second embodiment.

Figure 45 is a diagram illustrating the state where the user 
applies a pen onto a touch panel which is provided on the liquid 
crystal display of the handy phone according to the second 
embodiment.

Figure 46 is a diagram illustrating the state where a
microphone provided on the back of the handy phone according to
the second embodiment picks up a sound.

Figure 47 is a flowchart for explaining a picture data
generation method according to a third embodiment of the present
invention.

Figure 48 is a flowchart for explaining a picture data
generation method according to the third embodiment of the
present invention.



Figure 49 is a diagram for explaining a recording medium on 
which a program and data for making a computer perform any of the 
aforementioned embodiments are recorded, and a computer system. 

Figure 50 is a diagram illustrating an example of 
description of picture data according to the prior art. 

Figure 51 is a diagram illustrating a section where 
auxiliary information is to be inserted in picture data. 

Figure 52 is a diagram illustrating a method of using 
auxiliary information. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[Embodiment 1] 

Hereinafter, an auxiliary information generation method
according to a first embodiment of the present invention, which
corresponds to Claims 1 and 9, will be described with reference
to the drawings.

In advance of describing the auxiliary information
generation method, a method of using auxiliary information will
be described with reference to figures 52(a)-52(c). As shown in
figure 52(a), scene A of an athletic meeting, scene B of an
entrance ceremony, and scene C of travel are successively
recorded as video data on the same recording medium.

Amongst these video data, hatched parts A1, B1, and C1 in
figure 52(b) are given high values of importance as auxiliary
information.

From the scenes A (athletic meeting), B (entrance ceremony),
and C (travel), only the parts A1, B1, and C1 which are given
high values of importance as auxiliary information are extracted
and combined to form an extracted scene D, and the extracted
scene D is stored or transmitted. Thereby, only the scenes of high
degrees of importance can be efficiently recorded or transmitted
as compared with the case where all of the shot scenes of the
athletic meeting, entrance ceremony, and travel are stored or
transmitted as they are, resulting in reduced use of
the recording medium and reduced communication costs. The
extraction of the scenes of high degrees of importance may be
carried out during or after shooting of the scenes A, B, and C.
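
The formation of the extracted scene D can be sketched in Python
as follows. The labels A1, B1, and C1 follow figure 52; the other
part labels, the importance values, the durations, and the
threshold of 0.6 are invented for this example.

```python
def extract_digest(scenes, threshold):
    """Collect, in recording order, the parts whose importance meets threshold."""
    digest = []
    for _scene_name, parts in scenes:
        for label, importance, seconds in parts:
            if importance >= threshold:
                digest.append((label, seconds))
    return digest


# Each part is (label, degree of importance, duration in seconds).
scenes = [
    ("A: athletic meeting", [("A0", 0.2, 120), ("A1", 0.9, 40), ("A2", 0.1, 90)]),
    ("B: entrance ceremony", [("B1", 0.8, 30), ("B2", 0.3, 150)]),
    ("C: travel", [("C0", 0.4, 200), ("C1", 0.7, 60)]),
]
scene_d = extract_digest(scenes, threshold=0.6)

digest_seconds = sum(sec for _, sec in scene_d)
original_seconds = sum(sec for _, parts in scenes for _, _, sec in parts)
```

With these sample values, scene D consists of A1, B1, and C1 and
lasts 130 seconds, against 690 seconds for the three original
scenes together.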

Further, with respect to music data, since the contents of
the music data can be easily expressed by adding data indicating
one phrase of the music as auxiliary information, searching for
the music data is facilitated.

Figure 1 is a block diagram illustrating a video 
recording/playback apparatus comprising a combined camera and 
digital VTR 1000, which includes an auxiliary information 
generation apparatus according to the first embodiment of the 
present invention. 

In figure 1, reference numeral 11 denotes a CPU as a
controller; 12 denotes a recording/playback unit for recording or
reproducing data in/from a recording medium 13 such as a video
cassette, an optical disk, a magnetic disk, a memory card, or the
like; 13 denotes a recording medium on which multimedia data such
as video data and audio data are recorded; 14 denotes a camera
for shooting a picture, subjecting the picture to light-to-
electricity conversion, and converting a sound at shooting into
an audio signal; 15 denotes a codec unit for performing interface
between the camera and a monitor, and compressing or
decompressing inputted video and audio data; 16 denotes a monitor
unit for playing back the scene during recording or the video
after recording, such as a liquid crystal monitor or an
electronic or liquid crystal viewfinder; and 10 denotes an
external interface for exchanging data between this video
recording/playback apparatus and an external device such as a PC
or the like.

Figure 2 is a flowchart illustrating an example of operation 
when the CPU 11 generates auxiliary information, and figure 3 is 
a diagram for explaining an example of auxiliary information 
generated by the auxiliary information generation apparatus.

Further, figures 4(a) and 4(b) are schematic diagrams 
illustrating a combined camera and digital VTR as an example of 
the video recording/playback apparatus. 

In the combined camera and digital VTR, after a
power/operation mode switch (power switch) 105 is turned on, a
recording button (shooting switch) 104 is turned on, whereby the
CPU 11 shown in figure 1 puts the combined camera and digital VTR
into recording mode. A picture of a subject, which is formed by
a lens 100 of the camera 14, is converted into a video signal by
a light-to-electricity conversion unit (not shown) such as a CCD
included in the combined camera and digital VTR, and a sound at
shooting is converted into an audio signal by an internal
microphone 102. These video and audio signals are encoded
by the codec unit 15 shown in figure 1, and converted into a
format for recording by the recording/playback unit 12, and
thereafter, recorded on the recording medium 13 such as a video
cassette tape, an optical disk, or the like. During the
recording, the video signal outputted through the codec unit 15
is displayed by the monitor 103 (i.e., an electronic or liquid
crystal viewfinder 101 or a monitor 103 such as a liquid crystal
monitor, which are incorporated in the combined camera and
digital VTR), whereby the user can confirm whether a picture is
actually recorded or not, and what the recorded picture is like.

After shooting a desired picture, when the user operates the 
power/operation mode switch (operation mode switch) 105, the CPU
11 puts the combined camera and digital VTR into fast-rewind mode,
and detects the recording start position on the recording medium.
Thereafter, when the user operates the power/operation mode
switch (operation mode switch) 105, the CPU 11 puts the combined
camera and digital VTR into playback mode. Thereby, the
recording/playback unit 12 reads the video data of the shot 
picture from the recording medium 13, the codec unit 15 decodes 
the video signal, and the monitor 103 plays the video. Thereby, 
the combined camera and digital VTR performs playback with the
monitor 103, and the user can enjoy the shot picture on the
screen of the monitor 103, which is wider than the viewfinder 101.
Further, the user can easily confirm whether the desired picture
has been shot or not, or search for a point of time where
auxiliary information is to be inputted, or confirm whether
scenes extracted from the video data on the basis of the
auxiliary information are as intended or not. When the extracted
scenes are not as intended, the user can easily perform an
editing operation such as re-inputting of the auxiliary
information.

Furthermore, reference numeral 108 denotes a battery pack
for driving the combined camera and digital VTR; 107 denotes a
cassette lid which covers a portion where a video cassette tape
is mounted on the combined camera and digital VTR; 106 denotes a 
grip belt which supports a hand of the user holding the VTR at 
shooting; and 109 denotes an information button for inputting 
auxiliary information. 

Although it is not shown in figure 4, the played video can 
be displayed on an external wider monitor which is connected to 
the combined camera and digital VTR through the external 
interface 10 shown in figure 1, or the video data recorded on the 
recording medium can be transmitted to a personal computer 
through the external interface 10 to be edited on the personal 
computer.

Hereinafter, an auxiliary information generation method will
be described taking, as an example, the video recording/playback
apparatus having the auxiliary information generation apparatus
constructed as described above, with reference to figures 1, 2,
and 3.

The combined camera and digital VTR having the auxiliary 
information generation apparatus shown in figure 1 has a 
construction similar to that of an ordinary combined camera and 
digital VTR, as shown in figure 4. In the combined camera and 
digital VTR, a picture shot by the camera 14 is monitored by the
monitor 16 and, simultaneously, it is compressed by the codec 
unit 15, and recorded on the recording medium 13 through the 
recording/playback unit 12. 

This combined camera and digital VTR is different from the
conventional one only in the operation of the CPU 11, and the CPU 
11 can generate auxiliary information on the basis of control 
signals which are supplied from the user information button 109, 
the shooting switch 104, and the power switch 105. That is, the 
CPU 11 corresponds to the auxiliary information generation 
apparatus.

Figure 5 is a block diagram illustrating the construction of 
the auxiliary information generation apparatus implemented by the 
CPU 11. In figure 5, reference numeral 110 denotes a menu-basis 
auxiliary information storage means which stores plural models of 
auxiliary information corresponding to different menus; 111 
denotes an auxiliary information model selection means for
selecting one of the auxiliary information models stored in the
menu-basis auxiliary information storage means 110, according to
an instruction from a menu input means 116; 112 denotes an
auxiliary information model rewriting means for rewriting the
auxiliary information model selected by the auxiliary information
model selection means 111, according to parameters supplied from
a parameter input means 115; 113 denotes a recording timing
control means for controlling the timing of recording the
auxiliary information, according to an input from an auxiliary
information recording timing input means 117; and 114 denotes an
auxiliary information recording means for writing the auxiliary
information model which has been rewritten by the auxiliary
information model rewriting means 112, into the recording medium,
under control of the recording timing control means 113.

The auxiliary information model selection means 111, the
auxiliary information model rewriting means 112, the recording
timing control means 113, and the auxiliary information recording
means 114 are parts implemented by the CPU 11 itself, the
menu-basis auxiliary information storage means 110 is a part
implemented by a ROM (not shown) included in the CPU 11, and the
parameter input means 115, the menu input means 116, and the
auxiliary information recording timing input means 117 are parts
implemented by the user information buttons 109, the recording
button 104 as the shooting switch, the power/operation mode
switch 105 as the power switch, and the like.



Figure 2 shows the operation of the CPU 11 when generating
auxiliary information relating to digital data. It is assumed
that the CPU 11 operates even in the stand-by state where the
power/operation mode switch 105 of the combined camera and
digital VTR is OFF.

Initially, when the user turns on the power/operation mode 
switch 105 of the combined camera and digital VTR (step 21), the 
CPU 11 is notified that the power is turned on. Thereby, it is 
set by default that auxiliary information is to be inputted.

Next, whether the type of auxiliary information to be 
generated should be selected or not is inputted by combination of 
the operations of the switches such as the recording button 104,
the power/operation mode switch 105, and the like (step 22).
This selection may be performed by a menu method, that is, by
putting a question to the user with a menu displayed on the
monitor 103. To be specific, the CPU 11 outputs a question to
the monitor 16 as shown in figure 6. When a touch panel 103a is
provided as shown in figure 7, the CPU 11 outputs answer buttons
103b to the question, and displays the answer buttons on the
monitor 16. At this time, the CPU 11 searches the ROM which
stores the question. Instead of inputting the auxiliary
information by default, a question to the user may be made by the
menu method to obtain an answer to the question from the user.

The user answers the question as follows. That is, as
shown in figure 8, the user selects a menu button by performing,
with his/her thumb, a combination of switching operations of the
recording button 104, the power/operation mode switch 105, and
the like, which are provided on the rear right side of the body
of the combined camera and digital VTR, a predetermined number
of times, in a predetermined order. Alternatively, as shown in
figures 9, 10, or 11, an information switch 109, or a pressure
sensor 109a, or a sweat sensor 109b may be provided on the upper
surface of the body, and the user may select a menu button by
pressing it with the fingers of the hand that grips the combined
camera and digital VTR. When using a sensor, as shown in figure
12, it is necessary to normalize the sensor output by a sensor
output normalization unit 116a, compare the normalized sensor
output with a threshold which is generated by a threshold
generation unit 116b, by a comparator 116c, and then output the
comparison result to the auxiliary information pattern selection
means 111.
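
The sensor-based input described above (normalization, threshold
generation, and comparison) can be sketched roughly as follows.
This is only an illustrative sketch: the raw sensor range, the
threshold value, and the function names are assumptions which do
not appear in the specification.

```python
def normalize(raw, raw_min=0.0, raw_max=1023.0):
    """Map a raw sensor reading onto 0.0..1.0 (role of unit 116a).

    The 0..1023 range is an assumed sensor range, not from the text.
    """
    if raw_max <= raw_min:
        raise ValueError("raw_max must be greater than raw_min")
    clipped = min(max(raw, raw_min), raw_max)
    return (clipped - raw_min) / (raw_max - raw_min)


def is_pressed(raw, threshold=0.5):
    """Compare the normalized output with a threshold.

    The threshold plays the role of generation unit 116b and the
    comparison plays the role of comparator 116c; 0.5 is assumed.
    """
    return normalize(raw) >= threshold
```

The boolean result corresponds to the comparison result which is
passed on to the selection means.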

Furthermore, when the liquid crystal monitor 103 is provided
with a touch panel 103a as shown in figure 7, the user may select
an answer by putting a finger F on an option button 103b
displayed on the liquid crystal monitor. Further, as shown in
figure 13, the user may select an answer by applying a pen P such
as a plastic pen to the touch panel. In these cases, as shown in
figure 14, in the menu input means 116, the coordinates of the
portion on the panel which is pressed by the finger F or the pen
P are supplied from the touch panel 103a to a coordinate position
input means 116d, and a position comparison means 116e compares
the coordinates with the positions where the option buttons 103b
are displayed, which positions are supplied from the CPU 11,
whereby the selected option button is reported to the auxiliary
information model selection means 111. Further, as shown in
figure 15, a selected option may be inputted by hand-writing an
answer to the question with a pen on the touch panel 103a, and
automatically recognizing this answer. In this case, as shown in
figure 16, the coordinates of the portion on the touch panel 103a,
which portion is pressed by the finger F or the pen P, are
supplied from the touch panel 103a to the coordinate position
input means 116d. Then, a pattern matching means 116f recognizes
the hand-written character string by matching the input
characters, obtained as a trail of points pressed by the finger
or pen, against standard character patterns. Then, an answer
candidate collation means 116g collates the candidates of answers
to the question, which candidates are issued from the CPU 11,
with the recognized character string to judge whether the answer
is appropriate or not. When the answer is appropriate, the
answer is outputted to the auxiliary information model selection
means 111.
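
The position comparison and the answer candidate collation
described above can be sketched as follows. This is a minimal
sketch under stated assumptions: the rectangular button layout,
the Button type, and the function names are hypothetical, and the
character recognition step itself is not modelled (the recognized
string is simply taken as an input).

```python
from typing import NamedTuple, Optional, Sequence


class Button(NamedTuple):
    """Hypothetical on-screen option button (position supplied by the CPU)."""
    label: str
    x: int
    y: int
    width: int
    height: int


def hit_test(buttons: Sequence[Button], px: int, py: int) -> Optional[str]:
    """Return the label of the button containing the pressed point, if any."""
    for b in buttons:
        if b.x <= px < b.x + b.width and b.y <= py < b.y + b.height:
            return b.label
    return None


def collate_answer(recognized: str, candidates: Sequence[str]) -> Optional[str]:
    """Accept a recognized string only if it matches an answer candidate."""
    text = recognized.strip().lower()
    for candidate in candidates:
        if candidate.lower() == text:
            return candidate
    return None
```

In both cases a result of None corresponds to an input which is
not passed on to the selection means.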

Furthermore, as shown in figures 17(a) and 17(b), option
buttons 101a may be displayed in the viewfinder 101. In this
case, the option (menu) buttons are automatically
contrast-inverted (highlighted) one by one, and when the highlighted
option button matches the option the user desires, the user
selects this option button by appropriately combining the
operations of the recording button 104, the power/operation mode
switch 105, and the like as shown in figure 8. Alternatively,
the user may select an option button by pressing the information
switch 109, or the pressure sensor 109a, or the sweat sensor 109b,
which is provided at the upper surface of the body as shown in
figures 9, 10, or 11, with the fingers of the hand that grips the
video camera. The successive contrast inversion of option
buttons is realized as follows. That is, as shown in figure 18,
a button pattern formation means 11a, a button pattern inversion
means 11b, and a button designation means 11c are implemented by
software or the like in the CPU 11, and the contrast of a pattern
of an option button which is generated by the button pattern
formation means 11a is inverted by the button pattern inversion
means 11b. At this time, the option buttons to be
contrast-inverted (highlighted) are designated one by one by the button
designation means 11c, whereby successive contrast inversion of
the option buttons is realized. The button pattern formation
means 11a, the button pattern inversion means 11b, and the button
designation means 11c may be implemented by hardware outside the
CPU 11. When the user lightly presses his/her eye onto a pad
101b of the viewfinder 101 as shown in figure 10, or winks as
shown in figure 11, a pressure sensor (not shown) embedded in the
pad 101b surrounding the viewfinder 101 senses this, whereby the
corresponding option button is designated. In this case, in the
menu input means 116, as shown in figure 19, a pressure sensor
output input means 116i inputs the output from the pressure
sensor, and an input button decision means 116j receives, through
a button position input means 116h, information indicating an
option button which is currently highlighted according to the
output from the button designation means 11c. While the option
buttons are highlighted one by one, when the user operates the
pressure sensor by lightly pressing his/her eye onto the pad 101b
at an option button he/she desires, the input button decision
means 116j decides on this button as the input button. Alternatively,
a light-emitting element 116X and a light sensor 116Y which are
included in the viewfinder 101 as shown in figure 20 may be used
instead of the pressure sensor. In this case, the light-emitting
element 116X applies a weak light to the user's eye, and the
light sensor 116Y senses the reflected light from the eye to
detect whether the user opens the eye or not, and the output from
the light sensor is inputted to the sensor output input means
116i, thereby designating the corresponding option.
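
The successive highlighting of option buttons combined with a
single sensor input can be sketched as follows. This is a
simplified sketch: the notion of a "tick" per highlight step, the
function name, and the representation of the sensor output as one
boolean per tick are assumptions made for illustration.

```python
def select_by_scanning(options, sensor_events):
    """Return the option highlighted when the sensor first fires.

    options       -- option labels, highlighted one by one in order
                     (role of the button designation means)
    sensor_events -- one boolean per tick; True means the pressure or
                     light sensor detected the user's action
    """
    if not options:
        return None
    for tick, fired in enumerate(sensor_events):
        highlighted = options[tick % len(options)]  # wraps around
        if fired:
            return highlighted  # role of the input button decision means
    return None
```

For example, with three options and a sensor input on the third
tick, the third option is taken as the input button.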

When the user does not select the type of auxiliary 
information, the instruction of inputting auxiliary information 
is canceled. On the other hand, when the user makes an 
instruction to select the type of auxiliary information, 
selection is carried out in step 23. At this time, the user may 
select a description to be used from descriptions of auxiliary
information based on MPEG-7 or the like. However, it is
difficult for the ordinary users to understand and execute such
selection because the ordinary users are not expected to have
knowledge about MPEG-7 and XML. Therefore, as shown by a menu on
the monitor (figure 21(a)) or a menu in the viewfinder (figure
21(b)), the rule of inputting auxiliary information is selected 
from a menu having options as follows: recording auxiliary 
information for every important scene, inputting auxiliary 
information to a header or the like every time the shooting
location is changed (this operation is achieved by combination 
with the power/operation mode switch), and inputting auxiliary 
information at predetermined intervals. The processes and 
circuits required at this time are implemented by the same method 
as the menu method for deciding whether the type of auxiliary 
information should be selected or not. Alternatively, as shown 
by a menu on the monitor (figure 22(a)) or a menu in the 
viewfinder (figure 22(b)), the type of auxiliary information may 
be selected, according to the purpose, from a menu having options 
such as snap shooting, storage/record, athletic meeting, entrance 
ceremony, etc., and the selected auxiliary information may be 
converted into a set of descriptions based on MPEG-7 or the like
in the CPU 11. Although VideoSegment is used as the set of
descriptions, the viewpoint can be changed according to each
purpose.

The above-mentioned selection can be realized by storing
description sets based on MPEG-7 or the like in a ROM (not shown),
and selecting the contents in the ROM by the CPU 11 according to 
the menu option selected by the user. 

Turning to figure 2, when shooting is started (step 24), a
description of VideoSegment indicating a part of video is 
recorded as auxiliary information. During shooting, when the 
user desires to record auxiliary information, for example, when 
an important scene or a scene to be emphasized is being shot or
at the instant when the scene is changed, the user presses the 
information button 109, and the CPU 11 detects it, generates 
auxiliary information, and records the auxiliary information. 
While in the example shown in figure 2 a description of 
VideoSegment and a description of importance are recorded in 
steps 27 and 28, respectively, the present invention is not 
restricted thereto. The auxiliary information selected in step 
23 may be recorded. Furthermore, a plurality of information 
buttons, which are respectively assigned to different kinds of 
auxiliary information, may be provided on the body of the 
combined camera and VTR. In this case, the user presses any of 
these information buttons to record the corresponding auxiliary 
information. This operation is repeated until shooting is 
completed.

Figure 3 shows an example of auxiliary information generated 
according to the first embodiment, and MPEG-7 is employed for the 
description. In this first embodiment, in contrast with the
conventional method, XML description is employed. In figure 3, a
description of MediaInformation between descriptors 301a and 301b
describes the entire information of this video data. That is, it
indicates that the file format of this video data is MPEG4.
SegmentDecomposition shown by a descriptor 302 indicates that the
VideoSegment, which is a part of the video data, is temporal,
i.e., that the VideoSegment is arranged temporally. The
above-mentioned description is generated and recorded after the power
is turned on or before Segment description is started. In figure
3, a description between descriptors 303a and 303b is a
description relating to one segment (= one scene), and it
indicates that one scene continues for 1M19S (i.e., one minute
and nineteen seconds) at 30F (i.e., 30 frames per sec.). The
VideoSegment is followed by generated data, and a title indicated
by a descriptor 304 and the like should be inputted not during
shooting but after shooting. A description of PointOfView
between descriptors 305a and 305b indicates the degree of
importance, and it is expressed by a value for each Viewpoint as
shown by a descriptor 306.
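
A description of one segment of the kind explained above might be
assembled as in the following sketch. It is not a reproduction
of figure 3 and does not follow the MPEG-7 schema exactly: only
the names VideoSegment, PointOfView, Viewpoint, and Value are
taken from the text, while the function name and the attributes
id, duration, and frameRate are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_segment(segment_id, duration, frame_rate, viewpoints):
    """Build a simplified VideoSegment element.

    viewpoints maps a Viewpoint name (e.g. "son") to a value between
    0 and 1. The attribute names below are illustrative assumptions.
    """
    segment = ET.Element(
        "VideoSegment",
        {"id": segment_id, "duration": duration, "frameRate": frame_rate},
    )
    for name, value in viewpoints.items():
        pov = ET.SubElement(segment, "PointOfView", {"Viewpoint": name})
        ET.SubElement(pov, "Value").text = f"{value:.1f}"
    return segment


if __name__ == "__main__":
    seg0 = build_segment(
        "Seg0", "1M19S", "30F",
        {"son": 0.2, "daughter": 0.6, "exciting": 0.1},
    )
    print(ET.tostring(seg0, encoding="unicode"))
```

Building the description with an XML library rather than by
string concatenation keeps the output well-formed regardless of
the values supplied.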

Generally, PointOfView indicates a point for discrimination
from another object. The user may input all of the auxiliary
information by XML description, or XML description may be
automatically generated by preparing plural menus on assumption
of user conditions, and selecting a menu most appropriate to the
shooting condition from the menus. For example, when shooting a
scene in which a son and a daughter appear in an athletic meeting,
as shown in figure 23(a) (menus on the monitor) or figure 23(b)
(menus in the viewfinder), in a menu of "athletic meeting",
buttons 103m, 103n, 101m, 101n corresponding to tags of "son",
"daughter", "exciting" are prepared in advance, and a value, i.e.,
the degree of importance, is shown according to the time length
of a scene where the son and the daughter appear. Further,
"exciting" means a climax, and this description can be recorded
by operating the information button or the like provided on the
equipment. The simplest method of setting the value of exciting
is setting the value of importance at either "0" (= not
important) or "1" (= most important). However, one of the value
buttons shown in the menu on the monitor or the menu in the
viewfinder may be selected by the menu method, i.e., in the same
manner as that described for the case of answering to a question
as to whether the type of auxiliary information should be
selected or not. At this time, values in increments of "0.1" may
be inputted between "0" and "1" by combination of pressing the
power key and the recording key, and one of these values may be
selected. Alternatively, when an exciting button is provided at
the upper surface of the body of the combined camera and VTR, the
user may input a value of exciting by operating this button.
Furthermore, a value of exciting may be inputted by sensing the
degree of exciting of the user from the fingers of the user which
are put on a pressure sensor or a sweat sensor provided at the
upper surface of the body. Furthermore, as shown in figure 24,
the loudness of cheers or the loudness of the user's voice at
shooting may be measured, and the value of exciting may be
inputted according to the measured value. In this case, the
level of an audio signal obtained by a microphone (not shown) of
the camera 14 is detected by an audio signal level detecting
means 116k, and this level is normalized by an audio signal level
normalization means 116l. Then, a comparison means 116n compares
the normalized level with plural thresholds which are generated
by a threshold generator 116m, whereby a value to be designated
as a menu input can be selected automatically.
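
The automatic selection of a value from the audio level can be
sketched as follows. This is only a sketch under stated
assumptions: the 16-bit full-scale level, the particular
thresholds, and the output values are illustrative and are not
given in the specification.

```python
import bisect


def normalize_level(level, full_scale=32767):
    """Normalize a detected audio level to 0.0..1.0 (role of means 116l).

    A 16-bit full scale is assumed here for illustration.
    """
    return min(abs(level), full_scale) / full_scale


def level_to_value(level,
                   thresholds=(0.2, 0.4, 0.6, 0.8),
                   values=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Compare the normalized level with plural thresholds (means 116m, 116n).

    The number of thresholds exceeded selects one of the output values.
    """
    index = bisect.bisect_right(thresholds, normalize_level(level))
    return values[index]
```

With these assumed thresholds, a silent input yields "0" and a
full-scale input yields "1", with intermediate levels falling
into the steps in between.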

As described above, since auxiliary information is generated 
in connection with the shooting operation, the user can easily
generate auxiliary information at shooting. Further, PointOfView 
may be inputted after shooting. 

The degree of importance or the value of exciting inputted 
as described above can be displayed singly on the liquid crystal 
monitor or the viewfinder, or it can be superimposed on the
monitor picture during shooting, whereby the user can confirm the 
inputted value. Further, an icon corresponding to the contents 
of a message may be colored with a color according to the degree 
of importance, or the lightness in color of the icon may be 
increased according to the value, whereby the user can confirm 
the inputted value. 

For example, as shown in figure 25, the importance of
specified persons or the importance of exciting may be displayed
on the monitor 103 or the viewfinder 101 by switching the screen
from the scene being shot, or it may be superimposed on the scene
being shot. Furthermore, as shown in figure 27, specified
persons may be indicated by icons S and D, and the importance of
exciting may be indicated by the color of the icons. Figure
27(a) shows the case where the degree of importance is low, and
figure 27(b) shows the case where the degree of importance is
high. Further, as shown in figure 28, the importance of exciting
may be indicated by the brightness in color of the icons. Figure
28(a) shows the case where the degree of importance is low, and
figure 28(b) shows the case where the degree of importance is
high.

The screen display shown in figure 25 is realized as follows.
As shown in figure 29, the CPU 11 is provided with a display
message generation means 11d, a display message rewriting means
11e, and a screen output means 11f, and parameters in a typical
display message which is generated by the display message
generation means 11d (in figure 25, "son", "daughter", "0.7") are
rewritten by the display message rewriting means 11e and,
thereafter, the rewritten display message is outputted to the
monitor 16 by the screen output means 11f. The display message
generation means 11d, the display message rewriting means 11e,
and the screen output means 11f may be implemented by hardware
outside the CPU 11. A screen display shown in figure 26(a) is
realized as follows. As shown in figure 30, a superimposing
means 11g is placed between the display message rewriting means
11e and the screen output means 11f shown in figure 29, and a
message "persons: son and daughter, degree of importance = 0.7"
is superimposed on the picture being shot, which is outputted
from the codec unit 15. Further, a screen display shown in
figure 26(b) is realized as follows. As shown in figure 31, the
picture being shot, which is outputted from the codec unit 15, is
scaled down by a screen scale-down means 11h, and a message
screen outputted from the display message rewriting means 11e is
moved to the lower part of the monitor screen by a message moving
means 11i. Then, these screens are composited by a screen
composition means 11j, and the composite screen is outputted to
the monitor 16 by the screen output means 11f. The screen
display shown in figure 27 is realized as follows. As shown in
figure 32, an icon corresponding to a message is selected by an
icon selection means 11l from an icon storage means 11k which
stores plural icons corresponding to the menu, and the selected
icon is colored according to the degree of importance by an icon
coloring means 11m, and the colored icon is outputted to the
monitor 16 by the screen output means 11n. Further, the screen
display shown in figure 28 is realized as follows. As shown in
figure 33, using an icon contrast changing means 11o instead of
the icon coloring means 11m shown in figure 32, the contrast of
the icon is changed according to the degree of importance.
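
The coloring of an icon and the changing of its contrast according
to the degree of importance can be sketched as follows. The
specification does not state which colors or contrast levels are
used, so the blue-to-red blend, the minimum brightness, and the
function names below are purely assumptions for illustration.

```python
def clamp01(value):
    """Limit a degree of importance to the range 0.0..1.0."""
    return min(max(value, 0.0), 1.0)


def icon_color(importance):
    """Return an (R, G, B) color for the icon (role of means 11m).

    Assumed mapping: blue for low importance, red for high importance.
    """
    t = clamp01(importance)
    return (round(255 * t), 0, round(255 * (1.0 - t)))


def icon_brightness(importance, minimum=0.3):
    """Return a brightness factor for the icon (role of means 11o).

    Assumed mapping: linear from `minimum` (low) to 1.0 (high).
    """
    t = clamp01(importance)
    return minimum + (1.0 - minimum) * t
```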



By the way, in the example shown in figure 3, since the 
appearance time of "son" is only "0.2" in the initial video
segment Seg0 while the appearance time of "daughter" is "0.6",
this scene lacks interest, and therefore, the value of exciting
is set at "0.1". In the next video segment Seg1, since both
"son" and "daughter" appear for the same appearance time "0.6",
this scene is interesting, and therefore, the value of exciting 
is set at "0.8". Although, in the above description, the degree 
of importance is the appearance time in one scene, it is also 
possible to express the degree of importance by the value of 
exciting or the value of "son" or "daughter" (frequency of 
appearance). Further, the degree of importance may be determined
by combining the sizes of "son" and "daughter" on the screen and
the values of plural viewpoints. Further, although "son" and
"daughter" are selected as a sub menu of a menu "athletic
meeting", these may be selected as examples of viewpoints, and
this selection can be executed by selecting a value button in 
PointOfView. 

As described above, since a menu is selected according to 
the shooting condition and required parameters are selected from 
the menu, auxiliary information can be generated without 
necessity of knowledge about XML rule, and the generated 
auxiliary information can be attached to the original shot data.

That is, when the user selects a menu through the menu input
means 116 shown in figure 5, the auxiliary information pattern
selection means 111 selects one of the auxiliary information
patterns which correspond to different menus and are stored in
the menu-basis auxiliary information storage means 110. In the
example of figure 3, <PointOfView Viewpoint="son">, <PointOfView
Viewpoint="daughter">, <PointOfView Viewpoint="exciting">, ...
correspond to the patterns. In the auxiliary information pattern
corresponding to the selected menu, a portion corresponding to a
variable should be rewritten. That is, this portion is selected
by selecting a tag prepared in the menu, and the auxiliary
information pattern rewriting means 112 changes the variable in
the auxiliary information pattern to the information specified by
the user, according to the tag, thereby completing the auxiliary
information. In the example of figure 3, the variable is <Value>,
and this is changed to the value specified by the user (e.g.,
"0.6", "0.8", etc.). Thereafter, as shown in figure 5, the
auxiliary information recording means 114 records the completed
auxiliary information in the header section of a scene or the
like according to a timing specified by the user through the
auxiliary information recording timing input means 117. The
recording timing control means 113 controls the recording means
114 so that the auxiliary information is recorded in the header
section of a scene corresponding to the timing specified by the
user.
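
The selection of a pattern according to the menu and the rewriting
of its variable can be sketched as follows. This is a minimal
sketch: the dictionary standing in for the menu-basis storage
means, the "{value}" placeholder, and the function name are
assumptions, and the pattern strings are simplified from the
PointOfView descriptions mentioned above.

```python
# Hypothetical stand-in for the menu-basis storage means 110:
# one set of patterns per menu, keyed by the tag shown in the menu.
PATTERNS = {
    "athletic meeting": {
        "son": '<PointOfView Viewpoint="son"><Value>{value}</Value></PointOfView>',
        "daughter": '<PointOfView Viewpoint="daughter"><Value>{value}</Value></PointOfView>',
        "exciting": '<PointOfView Viewpoint="exciting"><Value>{value}</Value></PointOfView>',
    },
}


def complete_pattern(menu, tag, value):
    """Select the pattern for (menu, tag) and rewrite its variable.

    Selection corresponds to the role of means 111 and rewriting to
    the role of means 112. Values outside 0..1 are rejected.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must be between 0 and 1")
    template = PATTERNS[menu][tag]
    return template.format(value=f"{value:.1f}")
```

For example, selecting the tag "son" in the menu "athletic
meeting" with the value 0.6 yields a PointOfView description
whose Value is "0.6".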

Hereinafter, a description will be given of the case where 
the combined camera and VTR reproduces the recorded video data,
using the auxiliary information recorded as described above. As
shown in figure 34, an auxiliary information detection means 11p
detects the auxiliary information from a signal which is
reproduced from the recording medium by the recording/playback
unit 12, and an auxiliary information judgement means 11q judges
whether or not the degree of importance of the auxiliary
information is larger than a value set by the user, for example
"0.5". When the degree of importance is larger than the set
value, a recording medium drive control means 11r sets the
recording medium playback speed of a recording medium drive means
(not shown) at "normal playback", and a playback control means
11s controls the codec unit 15 so as to decode a signal
reproduced at this time. On the other hand, when the degree of
importance is smaller than "0.5" which is set by the user, the
recording medium drive control means 11r sets the recording
medium playback speed of the recording medium drive means (not
shown) at "fast-forward mode", and the playback control means 11s
controls the codec unit 15 so as not to decode the reproduced
signal, whereby playback skipping of a section having a low 
degree of importance is achieved. 
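
The playback control described above can be sketched as follows.
This is an illustrative sketch only: the representation of the
recorded data as a list of (segment, importance) pairs and the
function name are assumptions, and a segment whose importance is
exactly equal to the set value is treated here as skipped, which
the specification does not state.

```python
def plan_playback(segments, threshold=0.5):
    """Decide the drive mode and decoding for each segment.

    segments  -- iterable of (name, importance) pairs
    threshold -- value set by the user ("0.5" in the example above)

    Returns a list of (name, mode, decode) tuples.
    """
    plan = []
    for name, importance in segments:
        if importance > threshold:
            plan.append((name, "normal playback", True))
        else:
            # Assumption: importance equal to the threshold is skipped too.
            plan.append((name, "fast-forward", False))
    return plan
```

With the values of exciting from the example of figure 3 ("0.1"
for Seg0 and "0.8" for Seg1), Seg0 is skipped and Seg1 is played
back normally.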

As described above, according to the first embodiment of the 
present invention, in the combined camera and digital VTR,
parameters in a model of auxiliary information can be inputted by 
a method which is familiar to the ordinary users, without 
premising knowledge about MPEG-7 which cannot be expected from
the ordinary users. For example, the parameters are inputted by
combining operations of the recording switch, the power button, 
and the like which are operated in synchronization with shooting 
operation by the user, by operating the buttons of menu options 
which are displayed on the monitor, by operating the information 
switch which is provided on the body of the VTR, by touching or 
pen-writing on the liquid crystal monitor, or by sensing the 
user's eye against the viewfinder. Thereby, the user can input 
the auxiliary information, and can easily obtain index 
information when the shot moving picture is played back later.

While in this first embodiment whether auxiliary information 
should be selected or not is decided by the user every time the 
power button is turned on, the present invention is not 
restricted thereto. Whether auxiliary information should be 
selected or not may be set independently of turn-on of the power 
button. For example, in "VideoSegment", only the values such as
"View" which are likely to change during shooting may be 
generated at shooting while other values are generated in advance 
of shooting. Also in this case, the same effects as described 
above are achieved. Further, while in this first embodiment 
auxiliary information is generated in synchronization with the 
shooting button, there is a case where a camera or the like is 
set such that the power is turned off manually or automatically 
for long battery life. In order to cope with such a case,
auxiliary information may be generated in synchronization with
on/off of the power.

Further, while in this first embodiment the descriptions of 
XML or the like are explained for only several kinds of 
parameters, the present invention is not restricted thereto, and
it is possible to select desired parameters from a menu according 
to the purpose. Further, although the first embodiment is 
described on the premise that a shot picture is recorded, a shot
picture is not necessarily recorded, and it can be used also when 
compressed video and audio data are transmitted as they are to be 
used on a network or the like. Furthermore, although auxiliary 
information is generated at shooting, it is also possible to 
generate auxiliary information at playback by using the 
information button at the time of playback, i.e., when playing a 
picture shot by the combined camera and VTR itself to confirm the 
picture. When recording or transmitting the generated auxiliary
information, it is decided, according to the construction of the 
device or system, as to whether the auxiliary information should 
be recorded/transmitted after being multiplexed in the shot video 
and audio data, or it should be stored in another place for 
recording/transmission so that the auxiliary information can be 
recorded/transmitted independently of the shot video and audio 
data.

Furthermore, it is also possible to detect a section in the 
video data corresponding to a degree of importance specified by 
the user, and reproduce only this section to be displayed on the
monitor. Therefore, the CPU can extract only data having a high
degree of importance from the video data recorded on the 
recording medium, and reproduce the extracted data, whereby 
confirmation of the recorded data by the monitor can be carried 
out with efficiency. Further, the user can enjoy the recorded 
"work" without feeling tired, and the power consumption is 
reduced to secure more driving time. 
[Embodiment 2] 

Hereinafter, an image data generation method according to a 
second embodiment of the present invention, which corresponds to 
Claim 9, will be described with reference to the drawings. 

The image data generation method according to the second 
embodiment will be described taking, as an example, a case where 
an auxiliary information generation apparatus is contained in a 
handy phone having a movie function or a camera function. 

Figure 35 shows a handy phone having a movie function or a 
camera function. The handy phone has a lens 206 for shooting on 
the back of its body, and a light-to-electricity converter (not 
shown) inside the body, whereby a shot (moving) picture can be 
attached to a mail or the like, and transmitted through a mobile 
communication network. The sound during picture shooting is 
received by a microphone 207. In figure 35, reference numeral 
200 denotes a microphone for conversation, 201 denotes a ten key, 
202 denotes a function key, 203 denotes a liquid crystal display, 
204 denotes a speaker for conversation, and 205 denotes a whip 
antenna. 

Figure 36 is a block diagram of a handy phone 200 having an 
auxiliary information generation apparatus (CPU 11) according to 
the second embodiment of the invention. This handy phone has a 
movie function or a camera function. In figure 36, the same 
reference numerals as those shown in figure 1 denote the same or 
corresponding parts. Further, reference numeral 19 denotes an 
antenna, 18 denotes an RF front end for processing a 
high-frequency signal which is received/transmitted by the antenna 19, 
and 17 denotes a modulation/demodulation unit for 
modulating/demodulating the sound from the microphone, and 
outputting an audio signal to the speaker. 

The operation of the handy phone from generation of 
auxiliary information to generation of video data is identical to 
that described for the combined camera and digital VTR having the 
auxiliary information generator according to the first embodiment 
of the invention. In the handy phone, auxiliary information 
relating to digital data is generated and attached to (moving) 
picture data shot by the camera function during shooting or after 
shooting, and only a portion of the digital data having a high 
priority is extracted using the auxiliary information to generate 
reduced digital data, and the reduced digital data so generated 
is attached to a mail or the like to be transmitted, whereby the 
communication cost is reduced. 

Since the function of generating and adding auxiliary 
information and the playback operation by itself are identical to 
those described for the first embodiment, repeated description is 
not necessary. However, the recording medium 13 shown in figure 
36 is limited to a semiconductor memory and, in this case, the 
recording medium drive control means 11r is a memory address 
jumping means. Further, the playback control monitor 16 is 
limited to a liquid crystal display. 

Figure 37 is a block diagram illustrating the construction 
for implementing mail formation and (moving) picture attachment. 
To be specific, in figure 37, when the user operates the ten key 
201, a mail document formation means 11t forms a mail document by 
appropriately selecting characters and numbers from plural 
characters and numbers which are assigned to each key. As 
described above, auxiliary information is added to picture data 
which has been obtained by shooting a picture with the lens 206 
and subjecting the picture to light-to-electricity conversion by 
the light-to-electricity converter (not shown) such as a CCD. A 
transmission picture data formation means 11u forms picture data 
to be transmitted (hereinafter, referred to as transmission 
picture data) by extracting, from the obtained picture data, only 
a section where the degree of importance of the auxiliary 
information is higher than a predetermined value. A data 
attachment means 11v converts this transmission picture data into 
a file to be attached to the mail document. A protocol 
conversion means 11w converts the protocol so that the mail 
document to which the transmission picture data is attached can 
be outputted to a network such as the Internet. 

In this second embodiment, decision as to whether auxiliary 
information should be generated or not, decision as to whether 
auxiliary information should be selected or not, selection of 
persons, inputting of the value of importance of exciting, and 
the like are carried out in the same manner as described for the 
combined camera and VTR according to the first embodiment. To be 
specific, when a menu is displayed on the liquid crystal display, 
the user selects an option button with the function key or the 
user information button, or the user selects an option button by 
putting a finger on a portion corresponding to the option button 
on a touch panel which is provided on the liquid crystal display. 
Alternatively, when only a question is displayed on the liquid 
crystal display, the user inputs an answer to this question by 
using the ten key or the function key, or the user inputs an 
answer by hand-writing on the touch panel, or the user selects an 
answer by putting a finger on a pressure sensor or a sweat sensor 
which is provided on the body of the handy phone. Further, an 
answer to the question may be selected by detecting the level of 
cheers or user's voice which is picked up by the microphone. 

That is, also in the handy phone, it is possible to input 
auxiliary information by combination of operations of the 
recording switch, the power button, and the like which are 
assigned to the ten key 201 and the function key 202 operated in 
synchronization with the shooting operation of the user. At this 
time, a question is displayed on the liquid crystal display 203 
as shown in figure 38, or answer buttons to the question are 
displayed on the touch panel 203a as shown in figure 39, and the 
user can select an answer to the question by applying the finger 
4 or pen P onto a portion corresponding to a desired menu button 
displayed on the liquid crystal display 203 as shown in figure 40 
or 41. Alternatively, it is possible to input auxiliary 
information by operating the information switch 209 provided on 
the body of the handy phone as shown in figure 42, or by sensing 
the user's hand holding the body with the pressure sensor 209a or 
the sweat sensor 209b shown in figure 43 or 44, or by direct 
hand-writing onto the touch panel of the liquid crystal display 
203 as shown in figure 45, or by detecting the level of the 
cheers or the user's voice which is picked up by the conversation 
microphone 207. In this way, the user can easily input auxiliary 
information by inputting some parameters using any of the above- 
mentioned methods which are familiar to the ordinary users, 
without premising knowledge about MPEG-7 that cannot be expected 
from the ordinary users, and furthermore, the user can easily 
obtain index information (auxiliary information) when the shot 
moving picture is played back later. 

As described above, according to the second embodiment of 
the present invention, in the handy phone, auxiliary information 
is inputted by inputting some parameters using a method that is 
familiar to the ordinary users, without premising knowledge of 
MPEG-7 that cannot be expected from the ordinary users, which 
method is, for example, combination of operations of the 
recording switch, the power button, and the like which are 
assigned to the ten key 201 or the function key 202 operated in 
synchronization with the shooting operation of the user; user 
operation of putting a finger or a pen onto a portion 
corresponding to a desired option button of a menu displayed on 
the touch panel of the liquid crystal display 203; user operation 
on the information switch which is provided on the body of the 
handy phone; or user operation of touching or hand-writing on the 
liquid crystal monitor. Therefore, the user can easily input 
auxiliary information, and obtain index information (auxiliary 
information) when the shot moving picture is played back later. 

Also in this second embodiment, as in the first embodiment, 
whether auxiliary information should be selected or not may be 
selected by the user every time the user turns on the power 
button, or it may be set independently of turn-on of the power 
button. 

Further, auxiliary information may be generated in 
synchronization with the shooting button. When the handy phone 
is set such that the power is turned off manually or 
automatically for long battery life, in order to cope with this 
setting, auxiliary information may be generated in 
synchronization with power on/off. 

Further, as already described for the first embodiment, the 
descriptions of XML or the like are not limited to the 
above-described several kinds of parameters, and it is possible to 
select desired parameters from a menu according to the purpose. 
Further, although the second embodiment is described on the 
premise that a shot picture is transmitted, a shot picture is not 
necessarily transmitted, and it can be used also when compressed 
video and audio data are recorded as they are. Furthermore, 
although auxiliary information is generated at shooting, it is 
also possible to generate auxiliary information even at playback 
by using the information button at the time of playback, i.e., 
when playing a picture shot by the handy phone itself of this 
second embodiment to confirm the picture. When recording or 
transmitting the generated auxiliary information, it is decided, 
according to the construction of the device or system, as to 
whether the auxiliary information should be recorded/transmitted 
after being multiplexed in the shot video and audio data, or it 
should be stored in another place for recording/transmission so 
that the auxiliary information can be recorded/transmitted 
independently of the shot video and audio data. 

Furthermore, it is also possible to detect a section in the 
video data corresponding to a degree of importance specified by 
the user, and reproduce only this section by the handy phone 
itself to be displayed on the liquid crystal display. Therefore, 
the CPU can extract only data having a high degree of importance 
from the video data recorded on the recording medium, and 
reproduce the extracted data, whereby confirmation of the 
recorded data by the monitor can be carried out with efficiency. 
Further, the user can enjoy the recorded "work" without feeling 
tired, and the power consumption is reduced to secure more 
driving time. 

Furthermore, it is possible to record a value of a viewpoint 
which has previously been determined, by using the information 
button. When the ten key is used for inputting the value of the 
viewpoint, it should be instructed in advance with the function 
key or the like. 
[Embodiment 3] 

Figure 47 is a flowchart for explaining an example of an 
image data generation method according to a third embodiment of 
the present invention, which corresponds to Claims 13 and 17. It 
is assumed that the flowchart shown in figure 47 is executed by a 
control CPU which is embedded in a handy phone or the like. 

Figure 47 shows an example of a method for extracting the 
(moving) video and audio data to be transmitted so that as much 
video and audio data as possible can be transmitted within a 
specified telephone charge, when video and audio data to which 
auxiliary information generated by the auxiliary information 
generator according to the second embodiment is attached are 
transmitted by a handy phone or the like. 

In figure 47, a destination and contents to be transmitted 
are selected in steps 41 and 42. Thereafter, a telephone charge 
is set in step 43, and a length L of contents which can be 
transmitted at the set telephone charge is calculated in step 44. 
Since video and audio contents are usually data-compressed, the 
length L corresponds to the length of the compressed data. 
However, the real time of video and audio can be easily obtained 
by converting the length L to the data size before compression. 
The video and audio data is divided into sub-sections called 
"segments" according to the auxiliary information. Then, an 
initial value Pr of priority is set in step 45, and a priority of 
a target segment is calculated in step 46. Thereafter, in step 
47, the calculated priority is compared with the initial value Pr 
by utilizing priority information which is included in the 
auxiliary information for each segment. For example, in figure 3, 
in a description of PointOfView (viewpoint description), a value 
where ViewPoint="exciting" is extracted and compared with the Pr. 
Although in this example the priority is calculated from the 
value of one description, when there are plural descriptions of 
priority values, the corresponding priorities are derived by 
using a predetermined conversion expression, and a representative 
priority is determined and, thereafter, the representative 
priority is compared with the Pr. When the derived priority of 
the target segment is larger than the set value Pr, this segment 
is selected in step 48. When it is judged that the 
above-mentioned steps have been completed (step 49) and that the 
total of the lengths of the selected segments is shorter than the 
set data length L (step 491), it is confirmed that at least one 
segment is selected (step 493) to end the process. 
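
As a minimal illustrative sketch of steps 44, 46, and 47, the following Python fragment derives the data length L from a charge limit and reduces plural viewpoint values to one representative priority. The flat per-kilobyte tariff, the function names, and the weighted-maximum rule are assumptions made for illustration only; the specification fixes neither the tariff model nor the "predetermined conversion expression".

```python
def transmittable_length(charge_limit, charge_per_kilobyte):
    """Step 44: compressed data length L, in bytes, covered by the charge limit.

    Assumes a flat per-kilobyte tariff (hypothetical; not given in the text).
    """
    if charge_per_kilobyte <= 0:
        raise ValueError("charge_per_kilobyte must be positive")
    return int(charge_limit / charge_per_kilobyte * 1024)


def representative_priority(viewpoint_values, weights=None):
    """Steps 46-47: reduce one or more viewpoint values to a single priority.

    viewpoint_values maps a viewpoint name to its value, e.g. {"exciting": 0.7}.
    The weighted maximum is only a stand-in for the unspecified
    "predetermined conversion expression".
    """
    weights = weights or {}
    if not viewpoint_values:
        return 0.0  # a segment without any viewpoint is given the lowest priority
    return max(value * weights.get(name, 1.0)
               for name, value in viewpoint_values.items())
```

With a single description such as ViewPoint="exciting", the representative priority is simply that value, which matches the single-description case described above.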

On the other hand, when it is judged in step 491 that the 
total of the lengths of the selected segments is longer than the 
data length L, the priority set value Pr is incremented in step 
492, and the same operation as mentioned above is repeated. For 
example, in the case where the total of the lengths of the 
segments, which are selected when the priority set value Pr is 
"0.5", is longer than the data length L which can be transmitted 
at the predetermined telephone charge, an increment "0.1" is 
added to the priority set value Pr to make it "0.6", whereby the 
number of segments to be selected is reduced. This operation is 
repeated until the total of the segment lengths falls within the 
data length L which can be transmitted at the predetermined 
telephone charge. In this way, the priority set value Pr is 
increased in predetermined increments such as "0.1", and a 
priority set value Pr, at which the total of the segment lengths 
becomes lower than the data length L, is detected. Thereby, the 
total of the segment lengths falls within the predetermined data 
length L, and only the data having a high degree of importance 
can be collected. 
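
The loop of steps 45 to 493 can be sketched in Python as follows. This is a simplified illustration, not the claimed implementation: each segment is reduced to a hypothetical (priority, length) pair taken from the auxiliary information, and the function name and default values are assumptions. The threshold is recomputed from an integer step count so that repeated additions of 0.1 do not accumulate floating-point error.

```python
def select_within_budget(segments, max_length, initial_pr=0.5, step=0.1):
    """Raise the priority set value Pr until the selected segments fit in L.

    segments   -- list of (priority, length) pairs from the auxiliary information
    max_length -- data length L transmittable at the set telephone charge
    Returns (pr, selected). An empty selection means that no segment passed,
    which corresponds to the check of step 493 and is left to the caller.
    """
    if max_length < 0 or step <= 0:
        raise ValueError("max_length must be >= 0 and step must be > 0")
    k = 0
    while True:
        pr = round(initial_pr + k * step, 10)            # step 45, then step 492
        selected = [s for s in segments if s[0] > pr]    # steps 46 to 48
        total = sum(length for _, length in selected)
        if total <= max_length:                          # step 491
            return pr, selected
        k += 1


segments = [(0.9, 40), (0.7, 30), (0.55, 50), (0.3, 20)]
pr, chosen = select_within_budget(segments, max_length=80)
```

In this example the selection at Pr = 0.5 has a total length of 120, which exceeds 80, so Pr is raised to 0.6 and the two segments with priorities 0.9 and 0.7 (total length 70) are kept. Only the auxiliary information is examined, so the video and audio data themselves are not touched during selection.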

Since the above-mentioned processes are carried out using 
the auxiliary information, the video and audio data are not 
directly handled. Therefore, the processing load falls within a 
sufficiently allowable range. 

As described above, in the third embodiment of the invention, 
according to an upper limit of a telephone charge that is set by 
the user, an allowable calling time is determined within this 
charge. Then, a priority level is set, and the priority level is 
varied so that the total of segments whose priorities are higher 
than the set priority approaches, as closely as possible, a time 
whose upper limit is the calling time. Therefore, only important 
segments, i.e., important video and audio data, can be selected 
as many as possible within the range of the predetermined 
telephone charge, and these segments can be inputted. 
[Embodiment 4] 

Figure 48 is a flowchart for explaining an example of a 
video data generation method according to a fourth embodiment of 
the present invention, which corresponds to Claims 13 and 17. It 
is assumed that the flowchart shown in figure 48 is executed by a 
control CPU which is embedded in a handy phone or the like. 

Figure 48 shows a video data generation method based on the 
premise that generated video data is attached to a mail. 
Initially, a mail address and a title are set in steps 51 and 52, 
respectively. Thereafter, in step 53, information relating to 
the preference of a person to which the mail is directed (for 
example, a description of UserPreference in MPEG-7), which 
information is stored in the handy phone, is extracted from the 
database according to the mail address, and a priority Py is set 
in step 54. Simultaneously, a keyword is extracted from the 
title in step 55. Next, in steps 56, 57, 58, and 59, a segment 
whose priority is higher than the Py or a segment including a 
keyword in the description of viewpoint or the title of 
VideoSegment is selected from the selected contents. When, in 
step 591, it is judged that all of the segments have been 
subjected to the checks in steps 56 and 58, only a part relating 
to the title or the preference of the receiver of the mail is 
attached to the mail to be transmitted. 
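
The selection of steps 53 to 591 might be sketched in Python as below. The sketch rests on several assumptions that are not in the specification: the preference database is modeled as a plain mapping from mail address to the priority Py, each segment is a dictionary with hypothetical "title", "priority", and "viewpoints" fields, and keywords are taken naively as the words of the mail title having four or more characters.

```python
def title_keywords(title, min_length=4):
    """Step 55: naive keyword extraction from the mail title (an assumption)."""
    words = (w.strip(".,!?").lower() for w in title.split())
    return {w for w in words if len(w) >= min_length}


def select_for_recipient(segments, address, title, preference_db, default_py=0.5):
    """Steps 53 to 591: keep segments matching the receiver's preference or the title.

    preference_db maps a mail address to the priority Py (hypothetical layout).
    """
    py = preference_db.get(address, default_py)       # steps 53 and 54
    keywords = title_keywords(title)                  # step 55
    selected = []
    for seg in segments:                              # repeated until step 591
        text = " ".join([seg["title"], *seg["viewpoints"]]).lower()
        by_priority = seg["priority"] > py            # priority check
        by_keyword = any(k in text for k in keywords) # keyword check
        if by_priority or by_keyword:
            selected.append(seg)
    return selected


preference_db = {"alice@example.com": 0.7}
segments = [
    {"title": "Goal scene", "priority": 0.9, "viewpoints": ["exciting"]},
    {"title": "Half-time interview", "priority": 0.4, "viewpoints": ["calm"]},
    {"title": "Warm-up", "priority": 0.2, "viewpoints": ["calm"]},
]
picked = select_for_recipient(
    segments, "alice@example.com", "Interview highlights", preference_db)
```

Here the first segment is kept because its priority exceeds Py, and the second because its title contains a keyword of the mail title; the third satisfies neither condition and is not attached.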

As described above, according to the fourth embodiment, the 
taste or preference of the receiver is decided according to the 
mail address of the receiver, and a degree of importance is 
decided from the taste or preference, and segments whose degrees 
of importance are higher than the decided degree of importance 
are collected to be transmitted to the receiver. Therefore, only 
an important part of the contents can be transmitted, whereby the 
telephone charge can be reduced at both the transmitter and the 
receiver. 

While in figure 48 segments to be transmitted are selected 
according to the address and the title, segments to be 
transmitted may be selected from the keyword in the contents of 
the mail document, or the frequency of occurrence of the keyword. 
Further, although in figure 48 the data length is not limited, 
when figure 48 is combined with figure 47, further reduction in 
telephone charge can be achieved. 



While in the third and fourth embodiments segments in 
contents are selected, the present invention is not restricted 
thereto. The present invention is also applicable to the case 
where desired contents are selected from plural contents or from 
all of already-recorded contents. 

Furthermore, although the priority of each segment is 
calculated using the degree of importance or preference, the 
present invention is not restricted thereto. For example, 
information about the capability of the terminal at destination 
or other information such as length, title, and the like can also 
be used by converting it into the priority. 

Furthermore, although the user stores the preferences of 
mail receivers in the database in the handy phone, when such 
database exists on the network, the user need not have the 
database but can access the external database as necessary. 

Furthermore, it is possible to constitute a database by 
attaching data of your preference or data of the capability of a 
terminal at your end to a mail when transmitting the mail. 

Furthermore, while the third and fourth embodiments are 
described for the case where video and audio data are transmitted, 
the present invention is also applicable to the case where video 
and audio data having a predetermined length are recorded on a 
recording medium. 

Furthermore, the auxiliary information generation apparatus 
according to any of the first to fourth embodiments can be 
implemented by a computer as shown in figure 49. Figure 49 is a 
diagram illustrating a recording medium 61 on which computer 
program and data are recorded, and a computer system 62. It is 
assumed that the recording medium 61 is a semiconductor memory 
card. The procedure shown in figure 2, 47, or 48 is implemented 
by a program, and the program is recorded on the recording medium 
61, whereby the program can be ported to the computer system 62 
to be executed. Further, the same effects as those achieved by 
the aforementioned embodiments can be obtained by writing and 
reading the data itself in/from the recording medium. 

While in the aforementioned embodiments a video tape and a 
semiconductor memory are used as data recording media, a floppy 
disk or an optical disk such as CD-R, CD-RW, MO, MD, or DVD may 
be employed as long as it has a sufficient capacity. 

Furthermore, while in the first and second embodiments a 
combined camera and digital VTR is taken as an example, a 
portable VTR or stationary VTR having a separated camera may be 
employed. 

Moreover, while in the third and fourth embodiments a handy 
phone with a camera function is taken as an example, a PDA 
(Personal Digital Assistant) or a portable game machine may be 
employed as long as it is provided with a camera function or a 
camera can be connected to it. 



